While recent work on text-conditional 3D object generation has shown promising results, the state-of-the-art methods typically require multiple GPU-hours to produce a single sample. This is in stark contrast to state-of-the-art generative image models, which produce samples in a number of seconds or minutes. In this paper, we explore an alternative method for 3D object generation which produces 3D models in only 1-2 minutes on a single GPU. Our method first generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model which conditions on the generated image. While our method still falls short of the state-of-the-art in terms of sample quality, it is one to two orders of magnitude faster to sample from, offering a practical trade-off for some use cases. We release our pre-trained point cloud diffusion models, as well as evaluation code and models, at https://github.com/openai/point-e.
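A structural sketch of the two-stage pipeline the abstract describes is below. The function bodies are dummies standing in for the two diffusion models, not the actual point-e API; the real system would plug in a text-to-image sampler and an image-conditional point cloud sampler from the linked repository.

```python
# Minimal sketch of the two-stage text-to-3D pipeline (placeholder bodies).
import numpy as np

def text_to_image(prompt: str) -> np.ndarray:
    # Stage 1 (placeholder): a text-to-image diffusion model renders a
    # single synthetic view of the described object.
    return np.zeros((64, 64, 3), dtype=np.uint8)

def image_to_point_cloud(view: np.ndarray, n_points: int = 4096) -> np.ndarray:
    # Stage 2 (placeholder): an image-conditional diffusion model samples
    # an (n_points, 6) cloud of XYZ coordinates plus RGB colors.
    return np.zeros((n_points, 6), dtype=np.float32)

def generate_3d(prompt: str) -> np.ndarray:
    view = text_to_image(prompt)         # fast: one 2D sample
    return image_to_point_cloud(view)    # conditions on the generated view
```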
Diffusion models have recently been shown to generate high-quality synthetic images, especially when paired with a guidance technique to trade off diversity for fidelity. We explore diffusion models for the problem of text-conditional image synthesis and compare two different guidance strategies: CLIP guidance and classifier-free guidance. We find that the latter is preferred by human evaluators for both photorealism and caption similarity, and often produces photorealistic samples. Samples from a 3.5 billion parameter text-conditional diffusion model using classifier-free guidance are favored by human evaluators over those from DALL-E, even when the latter uses expensive CLIP reranking. Additionally, we find that our models can be fine-tuned to perform image inpainting, enabling powerful text-driven image editing. We train a smaller model on a filtered dataset and release the code and weights at https://github.com/openai/glide-text2im.
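A minimal sketch of one classifier-free guidance step, the strategy the abstract compares against CLIP guidance, is below. It assumes `model` is a noise predictor eps_theta(x_t, t, caption) and that passing an empty caption yields the unconditional prediction.

```python
import torch

def guided_eps(model, x_t, t, caption, guidance_scale=3.0):
    eps_cond = model(x_t, t, caption)   # text-conditional noise prediction
    eps_uncond = model(x_t, t, "")      # unconditional noise prediction
    # Extrapolate away from the unconditional direction; a scale > 1
    # trades diversity for caption fidelity, as described above.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```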
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.
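The pre-training task lends itself to a compact expression. Below is a minimal sketch of the symmetric contrastive objective (predict which caption goes with which image), assuming `image_features` and `text_features` are batch outputs of the two encoder towers, which are omitted here.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_features, text_features, temperature=0.07):
    # L2-normalize so dot products are cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = image_features @ text_features.t() / temperature  # (N, N)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Each image should match its own caption, and vice versa.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```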
Temporal exponential random graph models (TERGM) are powerful statistical models that can be used to infer the temporal pattern of edge formation and elimination in complex networks (e.g., social networks). TERGMs can also be used in a generative capacity to predict longitudinal time series data in these evolving graphs. However, parameter estimation within this framework fails to capture many real-world properties of social networks, including triadic relationships, small-world characteristics, and social learning theories, which could be used to constrain the probabilistic estimation of dyadic covariates. Here, we propose triadic temporal exponential random graph models (TTERGM), which incorporate these hierarchical network relationships within the graph model, to fill this void. We represent social network learning theory as an additional probability distribution that optimizes Markov chains in the graph vector space. The new parameters are then approximated via Monte Carlo maximum likelihood estimation. We show that our TTERGM model achieves improved fidelity and more accurate predictions compared to several benchmark methods on GitHub network data.
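For readers unfamiliar with the model family, a minimal sketch of the exponential-family form underlying (T)ERGMs, P(G) ∝ exp(θ · s(G)), is below. The statistics chosen (edge and triangle counts) are illustrative assumptions, not the paper's fitted model; actual estimation would use Monte Carlo maximum likelihood.

```python
import numpy as np

def statistics(adj):
    # Sufficient statistics of a simple undirected graph: edges, triangles.
    edges = adj.sum() / 2
    triangles = np.trace(adj @ adj @ adj) / 6
    return np.array([edges, triangles])

def toggle_log_odds(adj, i, j, theta):
    """Conditional log-odds of edge (i, j) given the rest of the graph,
    computed from change statistics; the basis of MCMC-MLE samplers."""
    adj_on, adj_off = adj.copy(), adj.copy()
    adj_on[i, j] = adj_on[j, i] = 1
    adj_off[i, j] = adj_off[j, i] = 0
    return theta @ (statistics(adj_on) - statistics(adj_off))
```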
For autonomous robots navigating urban environments, it is important that the robot stays on the designated travel path (i.e., the footpath) and avoids areas such as grass and garden beds, both for safety and for social-conformance reasons. This paper presents an autonomous navigation approach for unknown urban environments that combines semantic segmentation with LiDAR data. The proposed approach uses segmented image masks to build a 3D obstacle map of the environment, from which the boundaries of the footpath are computed. Unlike existing approaches, our method requires no pre-built map and provides a 3D understanding of the safe region of travel, enabling the robot to plan any path along the footpath. Experiments comparing our approach against two alternatives that use only LiDAR or only semantic segmentation show that, overall, our proposed approach achieves a success rate above 91% outdoors and above 66% indoors. Our approach keeps the robot on the safe travel path at all times and reduces the number of collisions.
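A minimal sketch of fusing a segmentation mask with per-pixel depth into a top-down obstacle map, in the spirit of the approach above, is below. The walkable class id, camera intrinsics, and grid resolution are illustrative assumptions.

```python
import numpy as np

WALKABLE = {0}           # e.g., the "footpath" class id (assumed)
FX = FY = 500.0          # focal lengths in pixels (assumed)
CX, CY = 320.0, 240.0    # principal point (assumed)

def obstacle_map(seg, depth, cell=0.1, size=200):
    grid = np.zeros((size, size), dtype=bool)  # top-down occupancy grid
    v, u = np.nonzero(depth > 0)               # pixels with valid range
    z = depth[v, u]
    x = (u - CX) * z / FX                      # back-project laterally
    gx = np.clip((x / cell + size // 2).astype(int), 0, size - 1)
    gz = np.clip((z / cell).astype(int), 0, size - 1)
    blocked = ~np.isin(seg[v, u], list(WALKABLE))
    grid[gz[blocked], gx[blocked]] = True      # mark non-footpath cells
    return grid
```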
We consider the problem of two-view matching under significant viewpoint changes with view synthesis. We propose two novel methods that minimize the view synthesis overhead. The first, named DenseAffNet, uses dense affine shape estimates from AffNet, which allow it to partition the image and rectify each partition with only a single affine map. The second, named DepthAffNet, combines information from depth maps and affine shape estimates to generate different rectifying affine maps for different image partitions. DenseAffNet is faster than the state of the art and more accurate on generic scenes. DepthAffNet is on par with the state of the art on scenes containing large planes. The evaluation is performed on 3 public datasets: the EVD dataset, the Strong ViewPoint Changes dataset, and the IMC Phototourism dataset.
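A minimal sketch of rectifying one image partition with a single affine map, as DenseAffNet does per partition, is below. The 2x2 shape `A` would come from dense AffNet estimates; here it is an illustrative placeholder simulating a viewpoint-induced shear.

```python
import cv2
import numpy as np

def rectify_partition(img, A):
    """Warp `img` by the inverse of the local affine shape A (2x2)."""
    A_inv = np.linalg.inv(A)
    M = np.hstack([A_inv, np.zeros((2, 1))])  # 2x3 matrix for warpAffine
    h, w = img.shape[:2]
    return cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR)

# Example: undo a horizontal shear, roughly simulating viewpoint change.
patch = np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8)
rectified = rectify_partition(patch, np.array([[1.0, 0.4], [0.0, 1.0]]))
```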
Artificial intelligence has enabled more accurate and efficient solutions to problems in various domains. In the agricultural sector, one of the main needs is to know at all times the extent of land that is or is not occupied by crops, in order to improve production and profitability. Traditional computational methods require manual, in-person data collection in the field, resulting in high labor costs, long execution times, and inaccurate results. The present work proposes a new approach based on deep learning techniques, complemented with conventional programming, to determine the area of populated and unpopulated crop zones. As a case study we consider one of the best-known companies in Ecuador dedicated to planting and harvesting sugarcane. The strategy combines a generative adversarial network (GAN), trained on a dataset of aerial photographs of natural and urban landscapes, to improve image resolution; a convolutional neural network (CNN), trained on a dataset of aerial photographs of sugarcane plots, to distinguish populated from unpopulated crop zones; and a standard image-processing module to compute the areas as percentages. The experiments performed show a significant improvement in the quality of the aerial photographs and a marked distinction between populated and unpopulated crop zones, and consequently a more accurate measure of cultivated and uncultivated areas. The proposed method can be extended to the detection of possible pests, weed vegetation, dynamic crop development, and qualitative and quantitative quality control.
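A minimal sketch of the final image-processing step, turning a per-pixel CNN prediction into area percentages, is below. The 0/1 label convention for unpopulated vs. populated crop zones is an assumption.

```python
import numpy as np

def area_percentages(mask):
    """mask: 2D array with 1 = populated crop zone, 0 = unpopulated."""
    total = mask.size
    populated = float((mask == 1).sum())
    return {"populated_%": 100 * populated / total,
            "unpopulated_%": 100 * (total - populated) / total}

# Example on a random mask; a real mask would come from the CNN.
print(area_percentages(np.random.randint(0, 2, (512, 512))))
```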
We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. Our work leverages a convex reformulation of the standard weight-decay penalized training problem as a set of group-$\ell_1$-regularized data-local models, where locality is enforced by polyhedral cone constraints. In the special case of zero regularization, we show that this problem is exactly equivalent to unconstrained optimization of a convex "gated ReLU" network. For problems with non-zero regularization, we show that convex gated ReLU models obtain data-dependent approximation bounds for the ReLU training problem. To optimize the convex reformulations, we develop an accelerated proximal gradient method and a practical augmented Lagrangian solver. We show that these approaches are faster than standard training heuristics for the non-convex problem, such as SGD, and outperform commercial interior-point solvers. Experimentally, we verify our theoretical results, explore the group-$\ell_1$ regularization path, and scale convex optimization for neural networks to image classification on MNIST and CIFAR-10.
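A minimal sketch of the unconstrained gated ReLU program mentioned above is below: with gate patterns fixed by random hyperplanes, training reduces to a group-$\ell_1$-regularized convex problem. Dimensions, the regularization weight, and the use of a generic solver (cvxpy) rather than the paper's specialized solvers are all illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 5, 8                        # samples, features, gate patterns
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
G = rng.standard_normal((d, m))           # random gate hyperplanes
D = (X @ G >= 0).astype(float)            # fixed 0/1 activation patterns

V = cp.Variable((d, m))                   # one weight vector per gate
pred = cp.sum(cp.multiply(D, X @ V), axis=1)   # gated linear prediction
lam = 0.1
group_l1 = sum(cp.norm(V[:, i], 2) for i in range(m))
problem = cp.Problem(cp.Minimize(cp.sum_squares(pred - y) + lam * group_l1))
problem.solve()
print("training objective:", problem.value)
```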
We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and the content loss. DeblurGAN achieves state-of-the-art performance in both the structural similarity measure and visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem - object detection on (de-)blurred images. The method is 5 times faster than the closest competitor - DeepDeblur [25]. We also introduce a novel method for generating synthetic motion-blurred images from sharp ones, allowing realistic dataset augmentation. The model, code and the dataset are available at https://github.com/KupynOrest/DeblurGAN.
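A minimal sketch of generating a synthetic motion-blurred image from a sharp one by convolving with a random linear motion kernel is below. This is a simple stand-in for the paper's more realistic trajectory-based method; the kernel length and angle are illustrative.

```python
import cv2
import numpy as np

def linear_motion_blur(img, length=15, angle_deg=30.0):
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0                  # horizontal line of motion
    M = cv2.getRotationMatrix2D((length / 2, length / 2), angle_deg, 1.0)
    kernel = cv2.warpAffine(kernel, M, (length, length))
    kernel /= kernel.sum()                        # preserve brightness
    return cv2.filter2D(img, -1, kernel)

# Example: blur a random "sharp" image to build a training pair.
sharp = np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
blurred = linear_motion_blur(sharp)
```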